Generalized Max Pooling
State-of-the-art patch-based image representations involve a pooling
operation that aggregates statistics computed from local descriptors. Standard
pooling operations include sum- and max-pooling. Sum-pooling lacks
discriminability because the resulting representation is strongly influenced by
frequent yet often uninformative descriptors, but only weakly influenced by
rare yet potentially highly-informative ones. Max-pooling equalizes the
influence of frequent and rare descriptors but is only applicable to
representations that rely on count statistics, such as the bag-of-visual-words
(BOV) and its soft- and sparse-coding extensions. We propose a novel pooling
mechanism that achieves the same effect as max-pooling but is applicable beyond
the BOV and especially to the state-of-the-art Fisher Vector -- hence the name
Generalized Max Pooling (GMP). It involves equalizing the similarity between
each patch and the pooled representation, which is shown to be equivalent to
re-weighting the per-patch statistics. We show on five public image
classification benchmarks that the proposed GMP can lead to significant
performance gains with respect to heuristic alternatives.

Comment: (to appear) CVPR 2014 - IEEE Conference on Computer Vision & Pattern Recognition (2014)
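The equalization idea above can be sketched as ridge-regularized least squares: find a pooled vector whose dot product with every patch descriptor is (approximately) one, which amounts to re-weighting the per-patch statistics. The following is a minimal NumPy sketch under that reading of the abstract, not the authors' implementation; the function name and the regularizer `lam` are illustrative choices:

```python
import numpy as np

def generalized_max_pooling(phi, lam=1.0):
    """GMP-style pooling: find xi such that every patch descriptor
    has (roughly) unit similarity with the pooled vector, Phi @ xi ~ 1.
    Solved as ridge-regularized least squares:
    xi = (Phi^T Phi + lam I)^{-1} Phi^T 1."""
    n, d = phi.shape
    ones = np.ones(n)
    return np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ ones)
```

With a small `lam`, a descriptor that occurs many times contributes no more to the pooled similarity than one that occurs once, which is exactly the equalizing effect the abstract attributes to max-pooling.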
Deep Fishing: Gradient Features from Deep Nets
Convolutional Networks (ConvNets) have recently improved image recognition
performance thanks to end-to-end learning of deep feed-forward models from raw
pixels. Deep learning is a marked departure from the previous state of the art,
the Fisher Vector (FV), which relied on gradient-based encoding of local
hand-crafted features. In this paper, we discuss a novel connection between
these two approaches. First, we show that one can derive gradient
representations from ConvNets in a similar fashion to the FV. Second, we show
that this gradient representation actually corresponds to a structured matrix
that allows for efficient similarity computation. We experimentally study the
benefits of transferring this representation over the outputs of ConvNet
layers, and find consistent improvements on the Pascal VOC 2007 and 2012
datasets.

Comment: To appear at BMVC 201
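The "structured matrix" claim above can be illustrated with the simplest case: the gradient of the loss with respect to a fully connected layer's weight matrix is the outer product of the back-propagated error and the layer's input activations, so similarities between two such gradient representations factor into two small dot products. This is a simplified illustration of the idea, not the paper's exact construction:

```python
import numpy as np

def fc_gradient_feature(activations, error_signal):
    """Gradient of the loss w.r.t. a fully connected layer's weights:
    the outer product error @ activations^T (a rank-1 matrix)."""
    return np.outer(error_signal, activations)

def factored_similarity(a1, e1, a2, e2):
    """Frobenius inner product of two rank-1 gradient matrices,
    computed without ever materializing them:
    <e1 a1^T, e2 a2^T>_F = (e1 . e2) * (a1 . a2)."""
    return float(e1 @ e2) * float(a1 @ a2)
```

The factored form makes kernel computations between gradient features cheap: two vector dot products instead of a full matrix inner product.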
Handwritten word-image retrieval with synthesized typed queries
We propose a new method for handwritten word-spotting which does not require prior training or gathering examples for querying. More precisely, a model is trained “on the fly” with images rendered from the searched words in one or multiple computer fonts. To reduce the mismatch between the typed-text prototypes and the candidate handwritten images, we make use of: (i) local gradient histogram (LGH) features, which were shown to model word shapes robustly, and (ii) semi-continuous hidden Markov models (SC-HMM), in which the typed-text models are constrained to a “vocabulary” of handwritten shapes, thus learning a link between both types of data. Experiments show that the proposed method is effective in retrieving handwritten words, and the comparison to alternative methods reveals that the contribution of both the LGH features and the SC-HMM is crucial. To the best of the authors’ knowledge, this is the first work to address this issue in a non-trivial manner.
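The local gradient histogram features mentioned above can be sketched as follows: grid the word image into cells and, in each cell, accumulate a magnitude-weighted histogram of gradient orientations. This is a generic, simplified sketch of that family of descriptors, not the paper's exact parameterization; `cell` and `bins` are illustrative defaults:

```python
import numpy as np

def local_gradient_histograms(img, cell=4, bins=8):
    """Simplified LGH-style descriptor: split the image into
    cell x cell blocks and build a magnitude-weighted histogram
    of (unsigned) gradient orientations per block."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi      # fold into [0, pi)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)
```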
Label embedding for text recognition
The standard approach to recognizing text in images consists in first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields (CRF). This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM (SSVM) framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents the following advantages: it does not require costly pre- or post-processing operations, it allows for the recognition of never-seen-before words, and the recognition process is efficient.
Experiments are performed on two challenging datasets (one of license plates and one of scene text) and show that the proposed method is competitive with standard bottom-up approaches to text recognition.
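The pairwise constraint described above (matching label-image pairs must be closer than non-matching ones) can be sketched as a stochastic hinge-loss update on a bilinear compatibility matrix. This is a minimal sketch of a ranking-style SSVM step under that reading, not the paper's exact objective; the learning rate and margin are illustrative:

```python
import numpy as np

def ssvm_embedding_step(W, img, label_pos, label_neg, lr=0.1, margin=1.0):
    """One stochastic update of the embedding matrix W.
    Images are scored against labels via the bilinear form img^T W label;
    the matching label must beat a non-matching one by a margin
    (a simplified SSVM ranking constraint)."""
    s_pos = img @ W @ label_pos
    s_neg = img @ W @ label_neg
    if margin + s_neg - s_pos > 0:        # hinge constraint violated
        W += lr * (np.outer(img, label_pos) - np.outer(img, label_neg))
    return W
```

At recognition time, a word image is scored against every candidate label and the highest-scoring label wins, which is the retrieval view of recognition the abstract describes.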
Iterative selection of transformations for image classification
In image classification, an effective strategy for learning a classifier that is invariant to certain transformations is to augment the training set with the same examples after the transformations have been applied to them. However, when the set of possible transformations is large, it can be difficult to select a small number of relevant transformations among them while keeping the training set at a reasonable size. Indeed, not all transformations have the same impact on performance; some can even degrade it. We propose an algorithm for automatic transformation selection: at each iteration, the transformation that yields the largest performance gain is selected. We evaluate our approach on the images of the ImageNet 2010 challenge and improve top-5 accuracy from 70.1% to 74.9%.
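The iteration described above is a greedy forward selection. A minimal sketch, assuming an `evaluate(selected)` callback (hypothetical here) that retrains and scores a classifier on the training set augmented with the selected transformations:

```python
def select_transformations(candidates, evaluate, max_k=3):
    """Greedy forward selection of data-augmentation transformations.
    At each round, add the candidate whose inclusion yields the largest
    validation-score gain; stop when no candidate improves the score."""
    selected, best = [], evaluate([])
    while len(selected) < max_k:
        scores = {t: evaluate(selected + [t])
                  for t in candidates if t not in selected}
        if not scores:
            break
        t_best = max(scores, key=scores.get)
        if scores[t_best] <= best:        # no transformation helps anymore
            break
        selected.append(t_best)
        best = scores[t_best]
    return selected, best
```

The stopping rule matters: since some transformations degrade performance, the loop must be allowed to terminate before `max_k` transformations are chosen.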
An introduction to biometrics: audio- and video-based person authentication
Biometrics, which refers to identifying an individual based on his or her physical or behavioral characteristics, has gained popularity among signal processing researchers in recent years. It has also attracted increased media attention since the tragic events of September 11, 2001. We first introduce the notion of biometrics. Then, we describe the architecture of biometric systems and the metrics used to evaluate their performance. We briefly discuss the most common biometrics and the different ways to combine them to obtain multimodal systems. Finally, we present applications of biometrics.
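The evaluation metrics mentioned above are commonly the False Accept Rate (FAR), the False Reject Rate (FRR), and their crossing point, the Equal Error Rate (EER). A minimal sketch of how these are computed from genuine and impostor match scores; this illustrates the standard definitions rather than anything specific to this article:

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted (above threshold);
    FRR: fraction of genuine scores rejected (at or below it)."""
    far = np.mean(np.asarray(impostor) > threshold)
    frr = np.mean(np.asarray(genuine) <= threshold)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep the threshold over all observed scores and return the
    operating point where FAR and FRR are closest (the EER)."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    rates = [far_frr(genuine, impostor, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (far + frr) / 2
```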